Finding Statistically Significant Attribute Interactions
Abstract
In many data exploration tasks it is meaningful to identify groups of attribute interactions that are specific to a variable of interest. For instance, in a dataset where the attributes are medical markers and the variable of interest (class variable) is binary indicating presence/absence of disease, we would like to know which medical markers interact with respect to the binary class label. These interactions are useful in several practical applications, for example, to gain insight into the structure of the data, in feature selection, and in data anonymisation. We present a novel method, based on statistical significance testing, that can be used to verify if the data set has been created by a given factorised class-conditional joint distribution, where the distribution is parametrised by a partition of its attributes. Furthermore, we provide a method, named astrid, for automatically finding a partition of attributes describing the distribution that has generated the data. State-of-the-art classifiers are utilised to capture the interactions present in the data by systematically breaking attribute interactions and observing the effect of this breaking on classifier performance. We empirically demonstrate the utility of the proposed method with examples using real and synthetic data.
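The idea of "breaking attribute interactions" under a class-conditional factorisation can be illustrated with a small sketch. It is not the authors' astrid implementation. It assumes one plausible reading of the abstract: each group of the attribute partition is shuffled jointly across instances, separately within each class, so that dependencies inside a group survive and dependencies between groups are destroyed. The function name `permute_within_class`, its parameters, and the toy data are hypothetical.

```python
import random

def permute_within_class(rows, labels, partition, seed=0):
    """Return a copy of rows in which every attribute group of the
    partition is shuffled jointly among instances of the same class.
    (Hypothetical helper; a sketch of the idea, not the paper's code.)"""
    rng = random.Random(seed)
    out = [list(r) for r in rows]          # leave the input untouched
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for group in partition:
            src = idx[:]                   # independent shuffle per group
            rng.shuffle(src)
            for dst_i, src_i in zip(idx, src):
                for a in group:            # move the whole group together
                    out[dst_i][a] = rows[src_i][a]
    return out

# Toy data: attributes 0 and 1 are identical (they interact),
# attribute 2 is unrelated to them.
rows = [[i % 3, i % 3, i % 2] for i in range(12)]
labels = [0] * 6 + [1] * 6

# Grouping attributes 0 and 1 keeps their relationship intact.
kept = permute_within_class(rows, labels, [[0, 1], [2]])

# Singleton groups shuffle every attribute independently.
split = permute_within_class(rows, labels, [[0], [1], [2]])
```

In a full procedure along the lines the abstract describes, a classifier would be trained and evaluated on data permuted under a candidate partition, and its performance compared with performance on the original data. A partition that leaves performance statistically indistinguishable is a candidate description of the class-conditional structure. That comparison step is omitted here.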
Similar resources
A Framework for Finding Statistically Significant Differences
In the questionnaire analysis, finding whether there is a statistically significant difference between two or more groups in a continuous measure is one of the major problems in researches. However, it is difficult for researchers to solve the issue of finding possible statistically significant difference, namely “Statistically Significant Difference Unawareness Issue”. There are two causes to ...
Algorithms for Efficient Mining of Statistically Significant Attribute Association Information
Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes and class (if present). Complex models learnt computationally from the data are more interpretable to a human analyst when such interdependencies are known. In this paper...
Association between Tumor Necrosis Factor-α -308 G/A Polymorphism and Multiple Sclerosis: A Systematic Review and Meta-Analysis
Multiple sclerosis (MS) is a complex polygenic disease in which gene-environment interactions are important. A number of studies have investigated the association between tumor necrosis factor-α (TNF-α) -308 G/A polymorphism (substitution G→A, designated as TNF1 and TNF2) and MS susceptibility in different populations, but the results of individual studies have been inconsistent. Therefore, per...
Breast cancer in first-degree relatives and risk of lung cancer: assessment of the existence of gene sex interactions.
BACKGROUND Previous studies have shown the sex differences in lung cancer and the associations between estrogen-related genes and non-small cell lung cancer. In the present study, we assumed the existence of shared candidate genes that are common in lung and breast cancers, and examined whether women with a family history of breast cancer are at increased risk of lung cancer compared with men, ...
Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — In the Case of Critical Dataset Size —
STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains be...
Journal title:
- CoRR
Volume: abs/1612.07597
Pages: -
Publication date: 2016